-
Notifications
You must be signed in to change notification settings - Fork 837
[wasm-split] Export/Import only necessary elements #8221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In `ModuleSplitter::shareImportableItems`, we now check if each module element is used by a secondary module and export & import it only when it is necessary. This removes the need to run RemoveUnusedModuleElements pass for secondary modules at the end, and also removes unnecessary exports from the primary module in case they are not used in any secondary modules. This reduces the running time on a 'acx_gallery' reproducer provided by @biggs0125 before from 78.3s to 16.4s in average on my machine, reducing it by around 79%. In that 'acx_gallery' program, this PR reduces the size of the primary module by 8% and reduces the all combined module size (primary + secondaries) by 3.7%. --- Detailed analysis for 'acx_gallery', a case where we split a module into 301 (1 primary + 300 secondary) modules: - Before this PR: Time: 78.3s Task breakdown: ``` Task Total Time (ms) Percentage --------------------------------------------------------------------------- shareImportableItems 34472.0000 49.90% removeUnusedSecondaryElements 24720.2000 35.78% moveSecondaryFunctions 4938.0500 7.15% writeModule_secondary 2892.7603 4.19% writeModule_primary 897.3630 1.30% exportImportCalledPrimaryFunctions 667.9780 0.97% indirectReferencesToSecondaryFunctions 201.9830 0.29% indirectCallsToSecondaryFunctions 133.2190 0.19% classifyFunctions 118.3200 0.17% setupTablePatching 42.6402 0.06% thunkExportedSecondaryFunctions 0.9307 0.00% initExportedPrimaryFuncs 0.8008 0.00% --------------------------------------------------------------------------- Overall Total ``` - After this PR: Time: 16.4s Task breakdown: ``` Task Total Time (ms) Percentage --------------------------------------------------------------------------- moveSecondaryFunctions 5341.1700 43.94% writeModule_secondary 2960.8319 24.36% shareImportableItems 1568.6400 12.91% writeModule_primary 765.7240 6.30% exportImportCalledPrimaryFunctions 762.9670 6.28% indirectReferencesToSecondaryFunctions 362.3300 2.98% classifyFunctions 204.4540 1.68% indirectCallsToSecondaryFunctions 139.2110 1.15% setupTablePatching 45.2461 0.37% thunkExportedSecondaryFunctions 3.0373 0.02% initExportedPrimaryFuncs 0.9982 0.01% --------------------------------------------------------------------------- Overall Total 12154.6095 100.00% ``` We can see that `shareImportableItems`, which took up the largest share, has been reduced significantly, and `removeUnusedSecondaryElements`, which was the second largest, is not necessary anymore.
src/ir/module-splitting.cpp
Outdated
|
|
||
| #define DELEGATE_FIELD_NAME_KIND(id, field, kind) \ | ||
| if (cast->field.is()) { \ | ||
| if (kind == ModuleItemKind::Global) { \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A switch here might be better than an if-else chain. If nothing else, it would give us a compiler error if we ever add a new ModuleItemKind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 8642d46
| NameCollector collector(used); | ||
| for (auto& global : secondary.globals) { | ||
| if (!global->imported()) { | ||
| collector.walk(global->init); | ||
| } | ||
| } | ||
| for (auto& segment : secondary.dataSegments) { | ||
| used.memories.insert(segment->memory); | ||
| if (segment->offset) { | ||
| collector.walk(segment->offset); | ||
| } | ||
| } | ||
| for (auto& segment : secondary.elementSegments) { | ||
| if (segment->table.is()) { | ||
| used.tables.insert(segment->table); | ||
| } | ||
| if (segment->offset) { | ||
| collector.walk(segment->offset); | ||
| } | ||
| for (auto* item : segment->data) { | ||
| collector.walk(item); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this except checking for tables and memories in active segments could be replaced with a call to collector.walkModuleCode().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| for (auto& table : primary.tables) { | ||
| auto secondaryTable = secondary.getTableOrNull(table->name); | ||
| if (!secondaryTable && !used.tables.count(table->name)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this condition different from the others? Maybe worth a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 5e7b376
| (type $0 (func)) | ||
| (import "primary" "%a" (func $0 (exact (type $0)))) | ||
| (import "primary" "%c" (func $1 (exact (type $0)))) | ||
| (import "primary" "%d" (func $2 (exact (type $0)))) | ||
| (import "primary" "already_exported" (func $3 (exact (type $0)))) | ||
| (import "primary" "%e" (func $4 (exact (type $0)))) | ||
| (import "primary" "%f" (func $5 (exact (type $0)))) | ||
| (import "primary" "%g" (func $6 (exact (type $0)))) | ||
| (import "primary" "%b" (func $7 (exact (type $0)))) | ||
| (import "primary" "%h" (func $8 (exact (type $0)))) | ||
| (import "primary" "%i" (func $9 (exact (type $0)))) | ||
| (func $call (type $0) | ||
| (call $0) | ||
| (call $1) | ||
| (call $2) | ||
| (call $3) | ||
| (call $4) | ||
| (call $5) | ||
| (call $6) | ||
| (call $7) | ||
| (call $8) | ||
| (call $9) | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few test cases in this file that have new functions in secondary modules. This is because we don't run RemoveUnusedModuleElements on secondaries anymore. In practice this wouldn't be a problem because if a function is not used anywhere it would have been optimized away before splitting.
tlively
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
This reverts commit 51e3151.
|
Sorry, I pushed code for time measuring by mistake. Reverted. |
|
Updated more code size stats in the PR description. |
`moveSecondaryFunctions`, despite the name, actually copies functions to secondary modules and removes them from the primary module. This prevents this overhead by actually moving the functions to secondary modules. This reduces the running time on acx_gallery (provided by @biggs0125) by 31%. --- - Before this PR: Time: 16.4s Task breakdown: ``` Task Total Time (ms) Percentage --------------------------------------------------------------------------- moveSecondaryFunctions 5341.1700 43.94% writeModule_secondary 2960.8319 24.36% shareImportableItems 1568.6400 12.91% writeModule_primary 765.7240 6.30% exportImportCalledPrimaryFunctions 762.9670 6.28% indirectReferencesToSecondaryFunctions 362.3300 2.98% classifyFunctions 204.4540 1.68% indirectCallsToSecondaryFunctions 139.2110 1.15% setupTablePatching 45.2461 0.37% thunkExportedSecondaryFunctions 3.0373 0.02% initExportedPrimaryFuncs 0.9982 0.01% --------------------------------------------------------------------------- Overall Total 12154.6095 100.00% ``` - After this PR: Time: 11.3s ``` Task Total Time (ms) Percentage --------------------------------------------------------------------------- writeModule_secondary 2908.4728 43.52% shareImportableItems 1562.7800 23.39% exportImportCalledPrimaryFunctions 786.6140 11.77% writeModule_primary 751.3420 11.24% indirectReferencesToSecondaryFunctions 250.7260 3.75% indirectCallsToSecondaryFunctions 149.7940 2.24% classifyFunctions 126.4520 1.89% moveSecondaryFunctions 106.0680 1.59% setupTablePatching 38.9292 0.58% initExportedPrimaryFuncs 0.8816 0.01% thunkExportedSecondaryFunctions 0.4937 0.01% --------------------------------------------------------------------------- Overall Total 6682.5533 100.00% ``` We can see the time spent in `moveSecondaryFunctions`, which used to take nearly half of the running time (after WebAssembly#8221), is now negligible.
In
ModuleSplitter::shareImportableItems, we now check if each module element is used by a secondary module and export & import it only when it is necessary. This removes the need to run RemoveUnusedModuleElements pass for secondary modules at the end, and also removes unnecessary exports from the primary module in case they are not used in any secondary modules.This reduces the running time on a 'acx_gallery' reproducer provided by @biggs0125 before from 78.3s to 16.4s in average on my machine, reducing it by around 79%.
In that 'acx_gallery' program, this PR reduces the size of the primary module by 13% and its export section by 62.1%, and reduces the all combined module size (primary + secondaries) by 7.4%.
Detailed analysis for 'acx_gallery', a case where we split a module into 301 (1 primary + 300 secondary) modules:
Time: 78.3s
Task breakdown:
Time: 16.4s
Task breakdown:
We can see that
shareImportableItems, which took up the largest share, has been reduced significantly, andremoveUnusedSecondaryElements, which was the second largest, is not necessary anymore.Code size
Primary module's export section size: -62.1%
With the name section present
With the name section stripped from the result